{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "EheA5_j_cEwc"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Probability Authors.\n",
        "\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\");"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "YCriMWd-pRTP"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\"); { display-mode: \"form\" }\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "OvRwFTkqcp1e"
      },
      "source": [
        "# Optimizers in TensorFlow Probability\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/probability/examples/Optimizers_in_TensorFlow_Probability\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Optimizers_in_TensorFlow_Probability.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Optimizers_in_TensorFlow_Probability.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/probability/tensorflow_probability/examples/jupyter_notebooks/Optimizers_in_TensorFlow_Probability.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hiBI9YkYoVBO"
      },
      "source": [
        "## Abstract\n",
        "\n",
        "In this colab we demonstrate how to use the various optimizers implemented in TensorFlow Probability."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CWQZUqnMf-3A"
      },
      "source": [
        "## Dependencies & Prerequisites"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "2nA2FSdTgcEM"
      },
      "outputs": [],
      "source": [
        "#@title Import { display-mode: \"form\" }\n",
        "\n",
        "%matplotlib inline\n",
        "\n",
        "\n",
        "import contextlib\n",
        "import functools\n",
        "import os\n",
        "import time\n",
        "\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import scipy as sp\n",
        "from six.moves import urllib\n",
        "from sklearn import preprocessing\n",
        "\n",
        "import tensorflow.compat.v2 as tf\n",
        "tf.enable_v2_behavior()\n",
        "\n",
        "import tensorflow_probability as tfp"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "W9ec_JW02g-Q"
      },
      "source": [
        "## BFGS and L-BFGS Optimizers\n",
        "\n",
        "Quasi Newton methods are a class of popular first order optimization algorithm. These methods use a positive definite approximation to the exact Hessian to find the search direction.\n",
        "\n",
        "The Broyden-Fletcher-Goldfarb-Shanno\n",
        "algorithm ([BFGS](https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm)) is a specific implementation of this general idea. It is applicable and is the method of choice for medium sized problems\n",
        "where the gradient is continuous everywhere (e.g. linear regression with an $L_2$ penalty).\n",
        "\n",
        "[L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) is a limited-memory version of BFGS that is useful for solving larger problems whose Hessian matrices cannot be computed at a reasonable cost or are not sparse. Instead of storing fully dense $n \\times n$ approximations of Hessian matrices, they only save a few vectors of length $n$ that represent these approximations implicitly.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "Tm6BS93hQ9Ym"
      },
      "outputs": [],
      "source": [
        "#@title Helper functions\n",
        "\n",
        "CACHE_DIR = os.path.join(os.sep, 'tmp', 'datasets')\n",
        "\n",
        "\n",
        "def make_val_and_grad_fn(value_fn):\n",
        "  @functools.wraps(value_fn)\n",
        "  def val_and_grad(x):\n",
        "    return tfp.math.value_and_gradient(value_fn, x)\n",
        "  return val_and_grad\n",
        "\n",
        "\n",
        "@contextlib.contextmanager\n",
        "def timed_execution():\n",
        "  t0 = time.time()\n",
        "  yield\n",
        "  dt = time.time() - t0\n",
        "  print('Evaluation took: %f seconds' % dt)\n",
        "\n",
        "\n",
        "def np_value(tensor):\n",
        "  \"\"\"Get numpy value out of possibly nested tuple of tensors.\"\"\"\n",
        "  if isinstance(tensor, tuple):\n",
        "    return type(tensor)(*(np_value(t) for t in tensor))\n",
        "  else:\n",
        "    return tensor.numpy()\n",
        "\n",
        "def run(optimizer):\n",
        "  \"\"\"Run an optimizer and measure it's evaluation time.\"\"\"\n",
        "  optimizer()  # Warmup.\n",
        "  with timed_execution():\n",
        "    result = optimizer()\n",
        "  return np_value(result)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ZF7iEPlp7FkN"
      },
      "source": [
        "### L-BFGS on  a simple quadratic function"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 104
        },
        "colab_type": "code",
        "id": "HJdXP5E828aP",
        "outputId": "0151b604-0ffc-44bf-a1d0-93c6125cca70"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.014586 seconds\n",
            "L-BFGS Results\n",
            "Converged: True\n",
            "Location of the minimum: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
            "Number of iterations: 10\n"
          ]
        }
      ],
      "source": [
        "# Fix numpy seed for reproducibility\n",
        "np.random.seed(12345)\n",
        "\n",
        "# The objective must be supplied as a function that takes a single\n",
        "# (Tensor) argument and returns a tuple. The first component of the\n",
        "# tuple is the value of the objective at the supplied point and the\n",
        "# second value is the gradient at the supplied point. The value must\n",
        "# be a scalar and the gradient must have the same shape as the\n",
        "# supplied argument.\n",
        "\n",
        "# The `make_val_and_grad_fn` decorator helps transforming a function\n",
        "# returning the objective value into one that returns both the gradient\n",
        "# and the value. It also works for both eager and graph mode.\n",
        "\n",
        "dim = 10\n",
        "minimum = np.ones([dim])\n",
        "scales = np.exp(np.random.randn(dim))\n",
        "\n",
        "@make_val_and_grad_fn\n",
        "def quadratic(x):\n",
        "  return tf.reduce_sum(scales * (x - minimum) ** 2, axis=-1)\n",
        "\n",
        "# The minimization routine also requires you to supply an initial\n",
        "# starting point for the search. For this example we choose a random\n",
        "# starting point.\n",
        "start = np.random.randn(dim)\n",
        "\n",
        "# Finally an optional argument called tolerance let's you choose the\n",
        "# stopping point of the search. The tolerance specifies the maximum\n",
        "# (supremum) norm of the gradient vector at which the algorithm terminates.\n",
        "# If you don't have a specific need for higher or lower accuracy, leaving\n",
        "# this parameter unspecified (and hence using the default value of 1e-8)\n",
        "# should be good enough.\n",
        "tolerance = 1e-10\n",
        "\n",
        "@tf.function\n",
        "def quadratic_with_lbfgs():\n",
        "  return tfp.optimizer.lbfgs_minimize(\n",
        "    quadratic,\n",
        "    initial_position=tf.constant(start),\n",
        "    tolerance=tolerance)\n",
        "\n",
        "results = run(quadratic_with_lbfgs)\n",
        "\n",
        "# The optimization results contain multiple pieces of information. The most\n",
        "# important fields are: 'converged' and 'position'.\n",
        "# Converged is a boolean scalar tensor. As the name implies, it indicates\n",
        "# whether the norm of the gradient at the final point was within tolerance.\n",
        "# Position is the location of the minimum found. It is important to check\n",
        "# that converged is True before using the value of the position.\n",
        "\n",
        "print('L-BFGS Results')\n",
        "print('Converged:', results.converged)\n",
        "print('Location of the minimum:', results.position)\n",
        "print('Number of iterations:', results.num_iterations)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2dnp7Nm161KY"
      },
      "source": [
        "### Same problem with BFGS"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 104
        },
        "colab_type": "code",
        "id": "e1k6n3_n4W2K",
        "outputId": "c34b4ccc-6b2d-44e4-ef47-aebd1ea37ec8"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.010468 seconds\n",
            "BFGS Results\n",
            "Converged: True\n",
            "Location of the minimum: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
            "Number of iterations: 10\n"
          ]
        }
      ],
      "source": [
        "@tf.function\n",
        "def quadratic_with_bfgs():\n",
        "  return tfp.optimizer.bfgs_minimize(\n",
        "    quadratic,\n",
        "    initial_position=tf.constant(start),\n",
        "    tolerance=tolerance)\n",
        "\n",
        "results = run(quadratic_with_bfgs)\n",
        "\n",
        "print('BFGS Results')\n",
        "print('Converged:', results.converged)\n",
        "print('Location of the minimum:', results.position)\n",
        "print('Number of iterations:', results.num_iterations)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "NT1GJU3s1LAW"
      },
      "source": [
        "## Linear Regression with L1 penalty: Prostate Cancer data\n",
        "\n",
        "Example from the Book: *The Elements of Statistical Learning, Data Mining, Inference, and Prediction* by Trevor Hastie, Robert Tibshirani and Jerome Friedman.\n",
        "\n",
        "Note this is an optimization problem with L1 penalty."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8RLvid3kqi4L"
      },
      "source": [
        "### Obtain dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "colab_type": "code",
        "id": "XlMkFHomqyxu",
        "outputId": "258fc324-7f4b-4eae-fff1-7d1ad3afb31d"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Downloading http://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data to /tmp/datasets/prostate.data.\n"
          ]
        }
      ],
      "source": [
        "def cache_or_download_file(cache_dir, url_base, filename):\n",
        "  \"\"\"Read a cached file or download it.\"\"\"\n",
        "  filepath = os.path.join(cache_dir, filename)\n",
        "  if tf.io.gfile.exists(filepath):\n",
        "    return filepath\n",
        "  if not tf.io.gfile.exists(cache_dir):\n",
        "    tf.io.gfile.makedirs(cache_dir)\n",
        "  url = url_base + filename\n",
        "  print(\"Downloading {url} to {filepath}.\".format(url=url, filepath=filepath))\n",
        "  urllib.request.urlretrieve(url, filepath)\n",
        "  return filepath\n",
        "\n",
        "def get_prostate_dataset(cache_dir=CACHE_DIR):\n",
        "  \"\"\"Download the prostate dataset and read as Pandas dataframe.\"\"\"\n",
        "  url_base = 'http://web.stanford.edu/~hastie/ElemStatLearn/datasets/'\n",
        "  return pd.read_csv(\n",
        "      cache_or_download_file(cache_dir, url_base, 'prostate.data'),\n",
        "      delim_whitespace=True, index_col=0)\n",
        "\n",
        "prostate_df = get_prostate_dataset()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CY4JVbrZqpZ-"
      },
      "source": [
        "### Problem definition"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "G7d6oBnYFZwh"
      },
      "outputs": [],
      "source": [
        "np.random.seed(12345)\n",
        "\n",
        "feature_names = ['lcavol', 'lweight',\t'age',\t'lbph',\t'svi', 'lcp',\t\n",
        "                 'gleason',\t'pgg45']\n",
        "\n",
        "# Normalize features\n",
        "scalar = preprocessing.StandardScaler()\n",
        "prostate_df[feature_names] = pd.DataFrame(\n",
        "    scalar.fit_transform(\n",
        "        prostate_df[feature_names].astype('float64')))\n",
        "\n",
        "# select training set\n",
        "prostate_df_train = prostate_df[prostate_df.train == 'T'] \n",
        "\n",
        "# Select features and labels \n",
        "features = prostate_df_train[feature_names]\n",
        "labels =  prostate_df_train[['lpsa']]\n",
        "\n",
        "# Create tensors\n",
        "feat = tf.constant(features.values, dtype=tf.float64)\n",
        "lab = tf.constant(labels.values, dtype=tf.float64)\n",
        "\n",
        "dtype = feat.dtype\n",
        "\n",
        "regularization = 0 # regularization parameter\n",
        "dim = 8 # number of features\n",
        "\n",
        "# We pick a random starting point for the search\n",
        "start = np.random.randn(dim + 1)\n",
        "\n",
        "def regression_loss(params):\n",
        "  \"\"\"Compute loss for linear regression model with L1 penalty\n",
        "\n",
        "  Args:\n",
        "    params: A real tensor of shape [dim + 1]. The zeroth component\n",
        "      is the intercept term and the rest of the components are the\n",
        "      beta coefficients.\n",
        "\n",
        "  Returns:\n",
        "    The mean square error loss including L1 penalty.\n",
        "  \"\"\"\n",
        "  params = tf.squeeze(params)\n",
        "  intercept, beta  = params[0], params[1:]\n",
        "  pred = tf.matmul(feat, tf.expand_dims(beta, axis=-1)) + intercept\n",
        "  mse_loss = tf.reduce_sum(\n",
        "      tf.cast(\n",
        "        tf.losses.mean_squared_error(y_true=lab, y_pred=pred), tf.float64))\n",
        "  l1_penalty = regularization * tf.reduce_sum(tf.abs(beta))\n",
        "  total_loss = mse_loss + l1_penalty\n",
        "  return total_loss"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ZyRl2uksnAY0"
      },
      "source": [
        "### Solving with L-BFGS\n",
        "\n",
        "Fit using L-BFGS. Even though the L1 penalty introduces derivative discontinuities, in practice, L-BFGS works quite well still.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 121
        },
        "colab_type": "code",
        "id": "sXkJbrYVqNSW",
        "outputId": "24058e94-5001-4e47-8758-36d9498d9b50"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.017987 seconds\n",
            "L-BFGS Results\n",
            "Converged: True\n",
            "Intercept: Fitted (2.3879985744556484)\n",
            "Beta:      Fitted [ 0.68626215  0.28193532 -0.17030254  0.10799274  0.33634988 -0.24888523\n",
            "  0.11992237  0.08689026]\n"
          ]
        }
      ],
      "source": [
        "@tf.function\n",
        "def l1_regression_with_lbfgs():\n",
        "  return tfp.optimizer.lbfgs_minimize(\n",
        "    make_val_and_grad_fn(regression_loss),\n",
        "    initial_position=tf.constant(start),\n",
        "    tolerance=1e-8)\n",
        "\n",
        "results = run(l1_regression_with_lbfgs)\n",
        "minimum = results.position\n",
        "fitted_intercept = minimum[0]\n",
        "fitted_beta = minimum[1:]\n",
        "\n",
        "print('L-BFGS Results')\n",
        "print('Converged:', results.converged)\n",
        "print('Intercept: Fitted ({})'.format(fitted_intercept))\n",
        "print('Beta:      Fitted {}'.format(fitted_beta))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "cieJV7D7gIpU"
      },
      "source": [
        "### Solving with Nelder Mead\n",
        "\n",
        "The [Nelder Mead method](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method) is one of the most popular derivative free minimization methods. This optimizer doesn't use gradient information and makes no assumptions on the differentiability of the target function; it is therefore appropriate for non-smooth objective functions, for example optimization problems with L1 penalty.\n",
        "\n",
        "For an optimization problem in $n$-dimensions it maintains a set of\n",
        "$n+1$ candidate solutions that span a non-degenerate simplex. It successively modifies the simplex based on a set of moves (reflection, expansion, shrinkage and contraction) using the function values at each of the vertices.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 121
        },
        "colab_type": "code",
        "id": "o8Dg-siSdrsV",
        "outputId": "01f8ca8e-75a4-4d22-d4ea-8e272b8b32f1"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.325643 seconds\n",
            "Nelder Mead Results\n",
            "Converged: True\n",
            "Intercept: Fitted (2.387998456121595)\n",
            "Beta:      Fitted [ 0.68626266  0.28193456 -0.17030291  0.10799375  0.33635132 -0.24888703\n",
            "  0.11992244  0.08689023]\n"
          ]
        }
      ],
      "source": [
        "# Nelder mead expects an initial_vertex of shape [n + 1, 1].\n",
        "initial_vertex = tf.expand_dims(tf.constant(start, dtype=dtype), axis=-1)\n",
        "\n",
        "@tf.function\n",
        "def l1_regression_with_nelder_mead():\n",
        "  return tfp.optimizer.nelder_mead_minimize(\n",
        "      regression_loss,\n",
        "      initial_vertex=initial_vertex,\n",
        "      func_tolerance=1e-10,\n",
        "      position_tolerance=1e-10)\n",
        "\n",
        "results = run(l1_regression_with_nelder_mead)\n",
        "minimum = results.position.reshape([-1])\n",
        "fitted_intercept = minimum[0]\n",
        "fitted_beta = minimum[1:]\n",
        "\n",
        "print('Nelder Mead Results')\n",
        "print('Converged:', results.converged)\n",
        "print('Intercept: Fitted ({})'.format(fitted_intercept))\n",
        "print('Beta:      Fitted {}'.format(fitted_beta))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ntLeQCtFZizJ"
      },
      "source": [
        "## Logistic Regression with L2 penalty\n",
        "\n",
        "For this example, we create a synthetic data set for classification and use the L-BFGS optimizer to fit the parameters."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 121
        },
        "colab_type": "code",
        "id": "B_uCJjyDZiVM",
        "outputId": "8f9ec50d-e625-4cca-bab3-51d0324c251e"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.056751 seconds\n",
            "Converged: True\n",
            "Intercept: Fitted (1.4111415084244365), Actual (1.3934058329729904)\n",
            "Beta:\n",
            "\tFitted [-0.18016612  0.53121578 -0.56420632 -0.5336374   2.00499675],\n",
            "\tActual [-0.20470766  0.47894334 -0.51943872 -0.5557303   1.96578057]\n"
          ]
        }
      ],
      "source": [
        "np.random.seed(12345)\n",
        "\n",
        "dim = 5  # The number of features\n",
        "n_obs = 10000  # The number of observations\n",
        "\n",
        "betas = np.random.randn(dim)  # The true beta\n",
        "intercept = np.random.randn()  # The true intercept\n",
        "\n",
        "features = np.random.randn(n_obs, dim)  # The feature matrix\n",
        "probs = sp.special.expit(\n",
        "    np.matmul(features, np.expand_dims(betas, -1)) + intercept)\n",
        "\n",
        "labels = sp.stats.bernoulli.rvs(probs)  # The true labels\n",
        "\n",
        "regularization = 0.8\n",
        "feat = tf.constant(features)\n",
        "lab = tf.constant(labels, dtype=feat.dtype)\n",
        "\n",
        "@make_val_and_grad_fn\n",
        "def negative_log_likelihood(params):\n",
        "  \"\"\"Negative log likelihood for logistic model with L2 penalty\n",
        "\n",
        "  Args:\n",
        "    params: A real tensor of shape [dim + 1]. The zeroth component\n",
        "      is the intercept term and the rest of the components are the\n",
        "      beta coefficients.\n",
        "\n",
        "  Returns:\n",
        "    The negative log likelihood plus the penalty term. \n",
        "  \"\"\"\n",
        "  intercept, beta  = params[0], params[1:]\n",
        "  logit = tf.matmul(feat, tf.expand_dims(beta, -1)) + intercept\n",
        "  log_likelihood = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(\n",
        "      labels=lab, logits=logit))\n",
        "  l2_penalty = regularization * tf.reduce_sum(beta ** 2)\n",
        "  total_loss = log_likelihood + l2_penalty\n",
        "  return total_loss\n",
        "\n",
        "start = np.random.randn(dim + 1)\n",
        "\n",
        "@tf.function\n",
        "def l2_regression_with_lbfgs():\n",
        "  return tfp.optimizer.lbfgs_minimize(\n",
        "      negative_log_likelihood,\n",
        "      initial_position=tf.constant(start),\n",
        "      tolerance=1e-8)\n",
        "\n",
        "results = run(l2_regression_with_lbfgs)\n",
        "minimum = results.position\n",
        "fitted_intercept = minimum[0]\n",
        "fitted_beta = minimum[1:]\n",
        "\n",
        "print('Converged:', results.converged)\n",
        "print('Intercept: Fitted ({}), Actual ({})'.format(fitted_intercept, intercept))\n",
        "print('Beta:\\n\\tFitted {},\\n\\tActual {}'.format(fitted_beta, betas))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "AVbdnfYu8cYh"
      },
      "source": [
        "## Batching support\n",
        "\n",
        "Both BFGS and L-BFGS support batched computation, for example to optimize a single function from many different starting points; or multiple parametric functions from a single point."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "xVAO1lit8zzK"
      },
      "source": [
        "### Single function, multiple starting points\n",
        "\n",
        "Himmelblau's function is a standard optimization test case. The function is given by:\n",
        "\n",
        "$$f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$$\n",
        "\n",
        "The function has four minima located at:\n",
        "- (3, 2),\n",
        "- (-2.805118, 3.131312),\n",
        "- (-3.779310, -3.283186),\n",
        "- (3.584428, -1.848126).\n",
        "\n",
        "All these minima may be reached from appropriate starting points.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 121
        },
        "colab_type": "code",
        "id": "6RhP1VON7tO_",
        "outputId": "bb4af95b-dab0-4f79-b5b9-44aa01744571"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 0.019095 seconds\n",
            "Converged: [ True  True  True  True]\n",
            "Minima: [[ 3.          2.        ]\n",
            " [-2.80511809  3.13131252]\n",
            " [-3.77931025 -3.28318599]\n",
            " [ 3.58442834 -1.84812653]]\n"
          ]
        }
      ],
      "source": [
        "# The function to minimize must take as input a tensor of shape [..., n]. In\n",
        "# this n=2 is the size of the domain of the input and [...] are batching\n",
        "# dimensions. The return value must be of shape [...], i.e. a batch of scalars\n",
        "# with the objective value of the function evaluated at each input point.\n",
        "\n",
        "@make_val_and_grad_fn\n",
        "def himmelblau(coord):\n",
        "  x, y = coord[..., 0], coord[..., 1]\n",
        "  return (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2\n",
        "\n",
        "starts = tf.constant([[1, 1],\n",
        "                      [-2, 2],\n",
        "                      [-1, -1],\n",
        "                      [1, -2]], dtype='float64')\n",
        "\n",
        "# The stopping_condition allows to further specify when should the search stop.\n",
        "# The default, tfp.optimizer.converged_all, will proceed until all points have\n",
        "# either converged or failed. There is also a tfp.optimizer.converged_any to\n",
        "# stop as soon as the first point converges, or all have failed.\n",
        "\n",
        "@tf.function\n",
        "def batch_multiple_starts():\n",
        "  return tfp.optimizer.lbfgs_minimize(\n",
        "      himmelblau, initial_position=starts,\n",
        "      stopping_condition=tfp.optimizer.converged_all,\n",
        "      tolerance=1e-8)\n",
        "\n",
        "results = run(batch_multiple_starts)\n",
        "print('Converged:', results.converged)\n",
        "print('Minima:', results.position)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "W73lxbpD_UEs"
      },
      "source": [
        "### Multiple functions\n",
        "\n",
        "For demonstration purposes, in this example we simultaneously optimize a large number of high dimensional randomly generated quadratic bowls."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 69
        },
        "colab_type": "code",
        "id": "yy9sPmO1_3w5",
        "outputId": "afe0255b-b130-42ea-9f3d-eaa07bc51833"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluation took: 1.994132 seconds\n",
            "All converged: True\n",
            "Largest error: 4.4131473142527966e-08\n"
          ]
        }
      ],
      "source": [
        "np.random.seed(12345)\n",
        "\n",
        "dim = 100\n",
        "batches = 500\n",
        "minimum = np.random.randn(batches, dim)\n",
        "scales = np.exp(np.random.randn(batches, dim))\n",
        "\n",
        "@make_val_and_grad_fn\n",
        "def quadratic(x):\n",
        "  return tf.reduce_sum(input_tensor=scales * (x - minimum)**2, axis=-1)\n",
        "\n",
        "# Make all starting points (1, 1, ..., 1). Note not all starting points need\n",
        "# to be the same.\n",
        "start = tf.ones((batches, dim), dtype='float64')\n",
        "\n",
        "@tf.function\n",
        "def batch_multiple_functions():\n",
        "  return tfp.optimizer.lbfgs_minimize(\n",
        "      quadratic, initial_position=start,\n",
        "      stopping_condition=tfp.optimizer.converged_all,\n",
        "      max_iterations=100,\n",
        "      tolerance=1e-8)\n",
        "\n",
        "results = run(batch_multiple_functions)\n",
        "print('All converged:', np.all(results.converged))\n",
        "print('Largest error:', np.max(results.position - minimum))\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "Optimizers in TensorFlow Probability",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
